Fix/export workspace path override#114
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a keyword-only ChangesWorkspace Path Override, Export Engine Fixes, and Workspace Listing Caching
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
services/workspace_listing.py (1)
93-100: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftKeep project-cache hits from doing full orchestration work.
prepare_workspace_orchestration()resolves workspace context and display maps beforeget_cached_projects(), but cache hits only need the fingerprint. Consider splitting fingerprint preparation from full orchestration or making the context/display-map fields lazy so cached workspace listings return cheaply.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/workspace_listing.py` around lines 93 - 100, The issue is that prepare_workspace_orchestration() performs expensive work resolving workspace context and display maps before checking get_cached_projects() for a cache hit. Restructure the logic to compute only the fingerprint first (before full orchestration), check get_cached_projects() with that fingerprint, and only invoke the complete prepare_workspace_orchestration() if the cache check misses, avoiding unnecessary work when returning cached results.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/export.py`:
- Around line 171-180: In the _read_last_export_ms function, after loading the
JSON with json.load(f) into the st variable, add a type validation check to
ensure st is a dictionary before calling st.get("lastExportTime"). If st is not
a dictionary (for example, if the JSON file contains an array, string, or
number), return 0 to treat it as unreadable state and fall back to full export.
You can either add an isinstance(st, dict) check before the st.get() call or add
AttributeError to the exception handler tuple to catch this case.
In `@services/export_engine.py`:
- Around line 371-382: The createdAt value from meta.get("createdAt") is used
directly without normalization in the assignment to created_ms, but this value
may be in seconds, an ISO string, or milliseconds depending on the source. This
causes issues in the max() comparison with db_mtime_ms (which is in
milliseconds) and elsewhere in the code like messages_to_bubbles() and filename
timestamps. Normalize the createdAt value to milliseconds before assigning it to
created_ms by adding logic that detects and converts seconds timestamps, ISO
strings, or other formats to milliseconds before the fallback to
datetime.now().timestamp() * 1000.
- Around line 217-225: The exception handler for json.loads(row["value"]) in the
try-except block is missing TypeError, which can be raised when row["value"] is
a non-string type like None or int. Update the except clause that currently
catches (json.JSONDecodeError, ValueError) to also include TypeError in the
tuple of exceptions, ensuring that malformed non-string values are logged and
skipped consistently with other similar error handling patterns in the codebase.
---
Nitpick comments:
In `@services/workspace_listing.py`:
- Around line 93-100: The issue is that prepare_workspace_orchestration()
performs expensive work resolving workspace context and display maps before
checking get_cached_projects() for a cache hit. Restructure the logic to compute
only the fingerprint first (before full orchestration), check
get_cached_projects() with that fingerprint, and only invoke the complete
prepare_workspace_orchestration() if the cache check misses, avoiding
unnecessary work when returning cached results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f08194c8-1cc4-4c57-919a-1b3fabf9336f
📒 Files selected for processing (10)
api/export_api.pypyproject.tomlscripts/export.pyservices/export_engine.pyservices/workspace_listing.pytests/test_api_export.pytests/test_export_base_dir_override.pytests/test_export_engine.pytests/test_workspace_path_thread_safety.pyutils/workspace_path.py
💤 Files with no reviewable changes (1)
- pyproject.toml
There was a problem hiding this comment.
🧹 Nitpick comments (1)
services/workspace_listing.py (1)
96-118: 🚀 Performance & Scalability | 🔵 TrivialDuplicated fingerprint computation introduces a latent cache-key divergence risk.
The inline block (lines 100–108) recomputes
global_storage_db_path,get_cli_chats_path, andfingerprint_workspace_storageidentically to howprepare_workspace_orchestrationdoes so internally (export_engine.py lines 129–135). On a cache miss this duplicates I/O, and more critically, the read key (fingerprint, line 109) and write key (orch.fingerprint, line 124) are produced by two separate copies of the same logic. They are byte-identical today, but if either site's conditional gating (e.g.,os.path.isfile/os.path.isdir) or therulesargument ever change, cache reads and writes would key on different fingerprints, silently breaking the cache.Consider threading the precomputed fingerprint into
prepare_workspace_orchestrationto maintain a single source of truth while preserving the cheap cache-hit short-circuit before the expensivectx/display-map resolution.♻️ Sketch: pass the computed fingerprint through
orch = prepare_workspace_orchestration( workspace_path, rules, nocache=effective_nocache, workspace_entries=workspace_entries, + fingerprint=fingerprint if not effective_nocache else None, )This requires adding a keyword-only
fingerprint: dict[str, Any] | None = Noneparameter toprepare_workspace_orchestrationthat, when provided, is used instead of recomputing. That keeps a single fingerprint definition and removes the duplicate stat lookups on the cache-miss path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/workspace_listing.py` around lines 96 - 118, The fingerprint computation in the inline block (lines 100-108) duplicates the identical logic that prepare_workspace_orchestration performs internally, creating a cache-key divergence risk if either implementation changes. Add a keyword-only fingerprint parameter to prepare_workspace_orchestration with a default value of None, then modify the function to use the provided fingerprint when available instead of recomputing it. In the calling code within workspace_listing.py, pass the computed fingerprint value to prepare_workspace_orchestration to eliminate the duplication while preserving the cache-hit short-circuit before expensive orchestration operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@services/workspace_listing.py`:
- Around line 96-118: The fingerprint computation in the inline block (lines
100-108) duplicates the identical logic that prepare_workspace_orchestration
performs internally, creating a cache-key divergence risk if either
implementation changes. Add a keyword-only fingerprint parameter to
prepare_workspace_orchestration with a default value of None, then modify the
function to use the provided fingerprint when available instead of recomputing
it. In the calling code within workspace_listing.py, pass the computed
fingerprint value to prepare_workspace_orchestration to eliminate the
duplication while preserving the cache-hit short-circuit before expensive
orchestration operations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa81b06d-e9a9-448b-abd6-114fc4760a91
📒 Files selected for processing (3)
scripts/export.pyservices/export_engine.pyservices/workspace_listing.py
🚧 Files skipped from review as they are similar to previous changes (2)
- services/export_engine.py
- scripts/export.py
14c256f to
dbec81e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
services/export_engine.py (1)
386-392: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winGuard
session["meta"]shape before.get()calls.At Lines 386-392, if
metais present but not a dict (orNone), this raisesAttributeErrorbefore the session-level read/parse guard, causing the whole export to fail on one bad session.Suggested fix
- meta = session.get("meta", {}) + raw_meta = session.get("meta") + meta = raw_meta if isinstance(raw_meta, dict) else {} session_id = session["session_id"] created_raw = meta.get("createdAt") created_ms = to_epoch_ms(created_raw) if created_raw else int( datetime.now().timestamp() * 1000, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/export_engine.py` around lines 386 - 392, The code retrieves `meta` from the session dictionary but does not validate that it is actually a dictionary before calling `.get()` on it. If the session contains a non-dict value for the meta key (such as None or a string), calling `meta.get("createdAt")` or `meta.get("name")` will raise an AttributeError. Add a type guard to check that `meta` is a dictionary before using its `.get()` methods, defaulting it to an empty dictionary if it is not the correct type, so that subsequent `.get()` calls operate safely without raising AttributeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/export_engine.py`:
- Line 233: The composer_id extraction on row["key"].split(":")[1] will raise an
IndexError if the key doesn't contain a colon, causing the entire export to
abort instead of gracefully handling malformed rows. Add validation before
splitting to check if the key contains the colon separator, and implement proper
error handling to skip malformed rows or provide a fallback value so the export
can continue processing remaining rows without interruption.
---
Duplicate comments:
In `@services/export_engine.py`:
- Around line 386-392: The code retrieves `meta` from the session dictionary but
does not validate that it is actually a dictionary before calling `.get()` on
it. If the session contains a non-dict value for the meta key (such as None or a
string), calling `meta.get("createdAt")` or `meta.get("name")` will raise an
AttributeError. Add a type guard to check that `meta` is a dictionary before
using its `.get()` methods, defaulting it to an empty dictionary if it is not
the correct type, so that subsequent `.get()` calls operate safely without
raising AttributeError.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f1e18b2d-d529-43cd-873f-05d7721079cc
📒 Files selected for processing (5)
api/export_api.pyservices/export_engine.pyservices/workspace_listing.pytests/test_export_base_dir_override.pyutils/workspace_path.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/test_export_base_dir_override.py
- utils/workspace_path.py
- services/workspace_listing.py
Closes #107
Summary
Replaces the CLI
--base-dirworkspace-path bypass that mutatedos.environ["WORKSPACE_PATH"]with an explicitresolve_workspace_path(override=...)parameter. Stacks on Monday PR 1 (refactor/export-engine-consolidation); the core override plumbing landed there during review — this PR adds dedicated regression tests and a doc comment fix.Sprint item Chen § #4 (3 pt, Medium).
Changes
utils/workspace_path.py—resolve_workspace_path(*, override: str | None = None); call-time override takes precedence over module override, env, and default; does not mutateWORKSPACE_PATHscripts/export.py—resolve_workspace_path(override=opts.get("base_dir"));os.environ["WORKSPACE_PATH"]mutation removedtests/test_export_base_dir_override.py(new) — asserts--base-diris passed asoverride=and leavesWORKSPACE_PATHunchangedtests/test_workspace_path_thread_safety.py— explicit-override precedence test (from PR 1 stack)Out of scope
WORKSPACE_PATHlock semanticsAcceptance criteria
scripts/export.pypasses--base-direxplicitly toresolve_workspace_path()instead of mutatingos.environos.environ["WORKSPACE_PATH"]mutation line removedresolve_workspace_path()accepts optionaloverrideparameterTest plan
mypy --strict utils/workspace_path.py scripts/export.pypytest tests/test_export_base_dir_override.py tests/test_workspace_path_thread_safety.py tests/test_cli_export_e2e.py -qpytest -k "workspace_path or export or thread" -qpython scripts/export.py --base-dir <workspaceStorage> --helpMerge note
Base branch:
refactor/export-engine-consolidation(PR 1 must merge first, or merge both in order).Summary by CodeRabbit
Bug Fixes
Improvements
--base-dirhandling to correctly override workspace path resolution without permanently changingWORKSPACE_PATH.Tests
--base-diroverride behavior and environment variable preservation.